

Search for: All records

Creators/Authors contains: "Indyk, Piotr"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites, whose policies may differ from this site's.

  1. Free, publicly-accessible full text available July 1, 2026
  2. Free, publicly-accessible full text available July 1, 2026
  3. Free, publicly-accessible full text available July 1, 2026
  4. Free, publicly-accessible full text available May 1, 2026
  5. The online list update problem is defined as follows: we are given a list of items, and the cost of accessing any particular item is its position from the start of the list. A sequence of item accesses arrives online, and our goal is to dynamically reorder the list so that the aggregate access cost is small. We study the stochastic version of the problem, where the items are accessed i.i.d. from an unknown distribution p. The study of the stochastic version goes back at least 60 years to McCabe. In this paper, we first consider the simple online algorithm that swaps an accessed item with the item right before it, unless it is at the very front. This algorithm is known as the Transposition rule. We theoretically analyze the stationary behavior of Transposition and prove that its performance is within a 1 + o(1) factor of the optimal offline algorithm for access sequences sampled from heavy-tailed distributions, proving a conjecture of Rivest from 1976. While the stationary behavior of the Transposition rule is theoretically optimal in the aforementioned i.i.d. setting, it can catastrophically fail under adversarial access sequences in which only the last and second-to-last items are repeatedly accessed. A desirable outcome would be a policy that performs well under both circumstances. To achieve this, we use reinforcement learning to design an adaptive policy that performs well in both the i.i.d. setting and under the above-mentioned adversarial access. Unsurprisingly, the learned policy appears to be an interpolation between Move-to-Front and Transposition, with its behavior closer to Move-to-Front for adversarial access sequences and closer to Transposition for sequences sampled from heavy-tailed distributions, suggesting that the policy is adaptive and capable of responding to patterns in the access sequence.
    Free, publicly-accessible full text available February 24, 2026
  6. The online list update problem is defined as follows: we are given a list of items, and the cost of accessing any particular item is its position from the start of the list. A sequence of item accesses arrives online, and our goal is to dynamically reorder the list so that the aggregate access cost is small. We study the stochastic version of the problem, where the items are accessed i.i.d. from an unknown distribution p. The study of the stochastic version goes back at least 60 years to McCabe. In this paper, we first consider the simple online algorithm that swaps an accessed item with the item right before it, unless it is at the very front. This algorithm is known as the Transposition rule. We theoretically analyze the stationary behavior of Transposition and prove that its performance is within a 1 + o(1) factor of the optimal offline algorithm for access sequences sampled from heavy-tailed distributions, proving a conjecture of Rivest from 1976. While the stationary behavior of the Transposition rule is theoretically optimal in the aforementioned i.i.d. setting, it can catastrophically fail under adversarial access sequences in which only the last and second-to-last items are repeatedly accessed. A desirable outcome would be a policy that performs well under both circumstances. To achieve this, we use reinforcement learning to design an adaptive policy that performs well in both the i.i.d. setting and under the above-mentioned adversarial access. Unsurprisingly, the learned policy appears to be an interpolation between Move-to-Front and Transposition, with its behavior closer to Move-to-Front for adversarial access sequences and closer to Transposition for sequences sampled from heavy-tailed distributions, suggesting that the policy is adaptive and capable of responding to patterns in the access sequence.
    Free, publicly-accessible full text available February 24, 2026
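The two rules contrasted in the abstract above are simple enough to simulate directly. Below is a minimal sketch (not the paper's experiment; list size, stream lengths, and the Zipf-like weights are illustrative assumptions) comparing Transposition and Move-to-Front on a heavy-tailed i.i.d. stream and on the adversarial last-two-items stream:

```python
import random

def access_cost(seq, universe, rule):
    """Total list-update cost of an access sequence: accessing the item
    at (1-based) position i costs i; the rule then reorders the list."""
    lst = list(universe)
    total = 0
    for x in seq:
        i = lst.index(x)
        total += i + 1
        if rule == "mtf":                    # Move-to-Front
            lst.insert(0, lst.pop(i))
        elif rule == "transpose" and i > 0:  # swap with predecessor
            lst[i - 1], lst[i] = lst[i], lst[i - 1]
    return total

random.seed(0)
items = list(range(1, 21))

# Heavy-tailed (Zipf-like) i.i.d. accesses: p(r) proportional to 1/r.
heavy = random.choices(items, weights=[1 / r for r in items], k=5000)
print("heavy-tailed:", access_cost(heavy, items, "transpose"),
      access_cost(heavy, items, "mtf"))

# Adversarial stream: alternate the last two items. Transposition just
# swaps them back and forth at the end of the list and pays the full
# position 20 on every access, while Move-to-Front pays only 2 per
# access once both items have been moved to the front.
adv = [20, 19] * 500
print("adversarial:", access_cost(adv, items, "transpose"),
      access_cost(adv, items, "mtf"))
```

On the adversarial stream, Transposition pays 20 on every one of the 1,000 accesses (20,000 total), while Move-to-Front pays 20 twice and 2 thereafter (2,036 total), matching the failure mode described in the abstract.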
  7. We consider the problem of hypothesis testing for discrete distributions. In the standard model, where we have sample access to an underlying distribution p, extensive research has established optimal bounds for uniformity testing, identity testing (goodness of fit), and closeness testing (equivalence or two-sample testing). We explore these problems in a setting where a predicted data distribution, possibly derived from historical data or predictive machine learning models, is available. We demonstrate that such a predictor can indeed reduce the number of samples required for all three property testing tasks. The reduction in sample complexity depends directly on the predictor’s quality, measured by its total variation distance from p. A key advantage of our algorithms is their adaptability to the precision of the prediction. Specifically, our algorithms can self-adjust their sample complexity based on the accuracy of the available prediction, operating without any prior knowledge of the estimation’s accuracy (i.e. they are consistent). Additionally, we never use more samples than the standard approaches require, even if the predictions provide no meaningful information (i.e. they are also robust). We provide lower bounds to indicate that the improvements in sample complexity achieved by our algorithms are information-theoretically optimal. Furthermore, experimental results show that the performance of our algorithms on real data significantly exceeds our worst-case guarantees for sample complexity, demonstrating the practicality of our approach. 
    Free, publicly-accessible full text available December 10, 2025
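For context on the first of these tasks, a minimal sketch of the classical, prediction-free collision-based uniformity tester follows (a standard baseline, not the paper's prediction-augmented algorithm; the threshold constant, sample sizes, and function names are illustrative assumptions):

```python
import random
from collections import Counter

def collision_rate(samples):
    """Fraction of sample pairs that collide. Its expectation is the
    squared 2-norm of p, which equals 1/n exactly when p is uniform
    over n elements and is strictly larger otherwise."""
    m = len(samples)
    pairs = sum(c * (c - 1) // 2 for c in Counter(samples).values())
    return pairs / (m * (m - 1) / 2)

def looks_uniform(samples, n, eps):
    """Accept when the empirical collision rate is near the uniform
    value 1/n. The (1 + 2*eps**2) threshold is an illustrative choice
    sitting between the uniform rate 1/n and the rate (1 + 4*eps**2)/n
    forced by being eps-far from uniform in total variation distance."""
    return collision_rate(samples) <= (1 + 2 * eps * eps) / n

random.seed(1)
n = 50
uniform_samples = [random.randrange(n) for _ in range(2000)]
skewed_samples = [random.randrange(n // 10) for _ in range(2000)]  # far from uniform
print(looks_uniform(uniform_samples, n, 0.5), looks_uniform(skewed_samples, n, 0.5))
```

The skewed stream concentrates on n/10 elements, so its collision rate is roughly 10/n and it is rejected, while the uniform stream's rate stays near 1/n.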
  8. Free, publicly-accessible full text available December 1, 2025
  9. Metric embeddings traditionally study how to map n items to a target metric space such that distance lengths are not heavily distorted. However, what if we are only interested in preserving the relative order of the distances, rather than their exact lengths? In this paper, we explore the fundamental question: given triplet comparisons of the form "item i is closer to item j than to item k," can we find low-dimensional Euclidean representations for the n items that respect those distance comparisons? Such order-preserving embeddings naturally arise in important applications, such as recommendations, ranking, crowdsourcing, psychometrics, and nearest-neighbor search, and have been studied since the 1950s under the name of ordinal or non-metric embeddings. Our main results include:
     - Nearly-tight bounds on triplet dimension: We introduce the concept of the triplet dimension of a dataset and show, surprisingly, that in order for an ordinal embedding to be triplet-preserving, its dimension needs to grow as n^2 in the worst case. This is nearly optimal, as n − 1 dimensions always suffice.
     - Tradeoffs for dimension vs. (ordinal) relaxation: We relax the requirement that every triplet be exactly preserved and present almost-tight lower bounds on the maximum ratio between distances whose relative order is inverted by the embedding. This ratio is known as (ordinal) relaxation in the literature and serves as a counterpart to (metric) distortion.
     - New bounds on Terminal and top-k-NNs embeddings: Moving beyond triplets, we study two well-motivated scenarios where we care about preserving specific sets of distances (not necessarily triplets). The first is Terminal Ordinal Embeddings, where we aim to preserve relative distance orders to k given items (the "terminals"); for these, we present matching upper and lower bounds. The second is top-k-NNs Ordinal Embeddings, where for each item we aim to preserve the relative order of its k nearest neighbors; for these, we present lower bounds. To the best of our knowledge, these are some of the first tradeoffs on triplet-preserving ordinal embeddings and the first study of Terminal and top-k-NNs Ordinal Embeddings.
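The core object in the abstract above, a triplet comparison, is easy to make concrete. The sketch below (illustrative only; the helper names `triplet_set` and `preserves_triplets` are not from the paper) extracts all triplets "i is closer to j than to k" from a point set and checks whether a candidate embedding preserves every one of them:

```python
import itertools
import math

def triplet_set(points):
    """All ordered triplets (i, j, k) with d(points[i], points[j]) <
    d(points[i], points[k]) under the Euclidean metric."""
    return {(i, j, k)
            for i, j, k in itertools.permutations(range(len(points)), 3)
            if math.dist(points[i], points[j]) < math.dist(points[i], points[k])}

def preserves_triplets(original, embedding):
    """Does the embedding respect every distance comparison of the original?"""
    return triplet_set(original) == triplet_set(embedding)

# Three points on a line, and two candidate 2-D embeddings.
orig = [(0.0,), (1.0,), (3.0,)]
good = [(0.0, 0.0), (1.0, 0.1), (3.0, 0.0)]  # same order of distances
bad = [(0.0, 0.0), (2.5, 0.0), (3.0, 0.0)]   # item 1 now closer to item 2 than to item 0
print(preserves_triplets(orig, good), preserves_triplets(orig, bad))  # → True False
```

This brute-force check enumerates all n(n−1)(n−2) ordered triplets, so it is only a definition-level illustration; the paper's results concern how small the embedding dimension can be while such checks still pass.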